Business Law
The vultures are circling for Chrome
Google has a monopoly, and that's the official line of the US federal government. In fact, it has two of them, having lost two separate antitrust cases that threaten to cripple the tech giant. The Department of Justice has proposed forcing Google to sell or otherwise divest itself of the Chrome browser as its first and preferred remedy. But who would buy it? Unsurprisingly, there are beaucoup business beaus lining up around the block for this browser bachelorette.
Google pays Samsung an 'enormous' amount of money to pre-install Gemini on phones
Google has been paying Samsung tons of cash every month to pre-install the AI app Gemini on its smartphones, according to a report by Bloomberg. This information comes to us as part of a pre-existing antitrust case against Google. Peter Fitzgerald, Google's VP of platforms and device partnerships, testified in federal court that Google began paying Samsung for this placement back in January. The two companies have a contract that's set to run at least two years. Fitzgerald told Judge Amit Mehta, who is overseeing the case, that Google provides Samsung with both fixed monthly payments and a percentage of revenue earned from advertisers within the Gemini app.
BeanCounter: A low-toxicity, large-scale, and open dataset of business-oriented text
Many of the recent breakthroughs in language modeling have resulted from scaling what is effectively the same model architecture to larger datasets. In this vein, recent work has highlighted performance gains from increasing training dataset size and quality, suggesting a need for novel sources of large-scale datasets. In this work, we introduce BeanCounter, a public dataset consisting of more than 159B tokens extracted from businesses' disclosures. We show that this data is indeed novel: less than 0.1% of BeanCounter appears in Common Crawl-based datasets, and the data is an order of magnitude larger than datasets relying on similar sources. Given the data's provenance, we hypothesize that BeanCounter is comparatively more factual and less toxic than web-based datasets. Exploring this hypothesis, we find that many demographic identities occur with similar prevalence in BeanCounter but in significantly less toxic contexts relative to other datasets. To demonstrate the utility of BeanCounter, we evaluate and compare two LLMs continually pre-trained on BeanCounter with their base models. We find an 18-33% reduction in toxic generation and improved performance within the finance domain for the continually pre-trained models. Collectively, our work suggests that BeanCounter is a novel source of low-toxicity, high-quality, domain-specific data with sufficient scale to train multi-billion parameter LLMs.
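For readers who want a feel for working with a corpus at this scale, here is a minimal sketch of streaming a business-disclosure dataset like BeanCounter from the Hugging Face Hub and counting tokens over a small sample. The repository id and the "text" field name are placeholders, not the dataset's actual location or schema.

```python
# Minimal sketch: stream a small sample of a large business-disclosure corpus
# and estimate tokens per document. Repo id and field name are placeholders.
from datasets import load_dataset
from transformers import AutoTokenizer

ds = load_dataset("your-org/beancounter", split="train", streaming=True)  # placeholder repo id
tokenizer = AutoTokenizer.from_pretrained("gpt2")

n_docs, n_tokens = 0, 0
for example in ds.take(1000):          # stream only the first 1,000 documents
    text = example["text"]             # assumed field name
    n_tokens += len(tokenizer.encode(text))
    n_docs += 1

print(f"{n_docs} documents, {n_tokens} tokens (~{n_tokens / n_docs:.0f} tokens/doc)")
```

Streaming avoids downloading the full 159B-token corpus just to inspect it, and the same iteration pattern feeds directly into continual pre-training pipelines that consume the data shard by shard.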
So You Bought a Humane Ai Pin. Here's What You Can Do Next
As of today, the Humane Ai Pin is dead, less than a year after its launch. Following an acquisition by HP, Humane shut down many of the core features of the artificial intelligence-powered wearable and deleted user data, rendering it largely useless. Yes, some functions remain, like checking battery life (useful!), but you can't access the voice assistant. If you spent $700 on the Ai Pin, you might be wondering what you can do now. These are the risks of being an early adopter, but not getting a refund on a device bricked before the warranty is even up feels like a rip-off.
Educational tech company Chegg sues Google over AI Overviews
Educational tech company Chegg has sued Google in federal court, claiming that the "AI Overviews" Google displays ahead of search results have hurt Chegg's traffic and revenue. In order to be included in Google's search results, Chegg alleges, it must "supply content that Google republishes without permission in AI-generated answers that unfairly compete for the attention of users on the internet in violation of antitrust laws of the United States." Chegg's suit is framed as an antitrust case: it accuses Google of abusing its monopoly position to force companies to supply materials for its "AI Overviews" on its search page. Failing to do so, it says, means a company could effectively be excluded from Google Search altogether. Chegg included a screenshot of a Google AI Overview that takes details from Chegg's website without attribution, even though the page in question appears lower down in the search results.
HiFi-KPI: A Dataset for Hierarchical KPI Extraction from Earnings Filings
Aavang, Rasmus, Rizzi, Giovanni, Bøggild, Rasmus, Iolov, Alexandre, Zhang, Mike, Bjerva, Johannes
The U.S. Securities and Exchange Commission (SEC) requires that public companies file financial reports tagging numbers with the machine-readable inline eXtensible Business Reporting Language (iXBRL) standard. However, the highly complex and highly granular taxonomy defined by iXBRL limits label transferability across domains. In this paper, we introduce the Hierarchical Financial Key Performance Indicator (HiFi-KPI) dataset, designed to facilitate numerical KPI extraction at specified levels of granularity from unstructured financial text. Our approach organizes a 218,126-label hierarchy using a taxonomy-based grouping method, investigating which taxonomy layer provides the most meaningful structure. HiFi-KPI comprises ~1.8M paragraphs and ~5M entities, each linked to a label in the iXBRL-specific calculation and presentation taxonomies. We provide baselines using encoder-based approaches and structured extraction using Large Language Models (LLMs). To simplify LLM inference and evaluation, we additionally release HiFi-KPI Lite, a manually curated subset with four expert-mapped labels. We publicly release all artifacts.
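To make the LLM structured-extraction baseline concrete, here is a minimal sketch that asks a chat model to return labeled numerical entities as JSON for a single filing paragraph. The model name, the four placeholder labels standing in for HiFi-KPI Lite's expert-mapped labels, and the prompt wording are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch of LLM-based structured KPI extraction from one paragraph.
import json
from openai import OpenAI

client = OpenAI()

# Placeholder label set standing in for the four expert-mapped HiFi-KPI Lite labels.
LITE_LABELS = ["Revenue", "NetIncome", "OperatingExpenses", "EarningsPerShare"]

def extract_kpis(paragraph: str) -> list[dict]:
    prompt = (
        "Extract every numerical KPI from the paragraph below. "
        f"Use only these labels: {', '.join(LITE_LABELS)}. "
        'Respond with JSON of the form {"entities": [{"label": "...", "value": "...", "unit": "..."}]}.\n\n'
        + paragraph
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # any JSON-capable chat model works here
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
    )
    return json.loads(resp.choices[0].message.content)["entities"]

print(extract_kpis("Revenue for the quarter was $4.2 billion, up 12% year over year."))
```

Scoring such a predictor against the dataset's gold annotations then reduces to matching the predicted (label, value) pairs to the iXBRL-linked entities at the chosen taxonomy layer.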
LegalBench.PT: A Benchmark for Portuguese Law
Canaverde, Beatriz, Pires, Telmo Pessoa, Ribeiro, Leonor Melo, Martins, André F. T.
The recent application of LLMs to the legal field has spurred the creation of benchmarks across various jurisdictions and languages. However, no benchmark has yet been specifically designed for the Portuguese legal system. In this work, we present LegalBench.PT, the first comprehensive legal benchmark covering key areas of Portuguese law. To develop LegalBench.PT, we first collect long-form questions and answers from real law exams, and then use GPT-4o to convert them into multiple-choice, true/false, and matching formats. Once generated, the questions are filtered and processed to improve the quality of the dataset. To ensure accuracy and relevance, we validate our approach by having a legal professional review a sample of the generated questions. Although the questions are synthetically generated, we show that their grounding in human-created exams, together with our rigorous filtering and processing, results in a reliable benchmark for assessing LLMs' legal knowledge and reasoning abilities. Finally, we evaluate the performance of leading LLMs on LegalBench.PT and investigate potential biases in GPT-4o's responses. We also assess the performance of Portuguese lawyers on a sample of questions to establish a baseline for model comparison and validate the benchmark.
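As a sketch of how evaluation on such a benchmark typically works, the snippet below scores a simple predictor on multiple-choice items. The JSONL path and the "question"/"options"/"answer" field names are assumptions about how a benchmark like this might be packaged, not the released format.

```python
# Minimal sketch: accuracy of a predictor on multiple-choice benchmark items.
import json

def load_items(path: str) -> list[dict]:
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f]

def accuracy(items: list[dict], predict) -> float:
    correct = sum(
        1 for item in items
        if predict(item["question"], item["options"]) == item["answer"]
    )
    return correct / len(items)

if __name__ == "__main__":
    items = load_items("legalbench_pt_sample.jsonl")   # placeholder path
    # Trivial baseline that always answers "A"; swap in an LLM call that maps
    # (question, options) -> an option letter for a real evaluation.
    print(f"accuracy: {accuracy(items, lambda q, opts: 'A'):.2%}")
```

The same harness extends to the true/false and matching formats by changing only the comparison between the prediction and the gold answer.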
Why OpenAI is trying to untangle its 'bespoke' corporate structure
On the Friday after Christmas, OpenAI published a blog post titled "Why OpenAI's structure must evolve to advance our mission." In it, the company detailed a plan to reorganize its for-profit arm into a public benefit corporation (PBC). In the weeks since that announcement, I've spoken to some of the country's leading corporate law experts to gain a better understanding of OpenAI's plan, and, more importantly, what it might mean for its mission to build safe artificial general intelligence (AGI). "Public benefit corporations are a relatively recent addition to the universe of business entity types," says Jens Dammann, professor of corporate law at the University of Texas School of Law. Depending on who you ask, you may get a different history of PBCs, but in the dominant narrative, they came out of a certification program created by a nonprofit called B Lab. Companies that complete a self-assessment and pay an annual fee to B Lab can carry the B Lab logo on their products and websites and call themselves B-Corps.